Dimension Reduction Forests: Local Variable Importance Using Structured Random Forests
Abstract
Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such assessments at a local level, motivated by applications in personalized medicine, policy-making, and bioinformatics. We propose a new nonparametric estimator that pairs the flexible random forest kernel with local sufficient dimension reduction to adapt to a regression function’s local structure. This allows us to estimate a meaningful directional local variable importance measure at each prediction point. We develop a computationally efficient fitting procedure and provide sufficient conditions for the recovery of the splitting directions. We demonstrate significant accuracy gains of our proposed estimator over competing methods on simulated and real regression problems. Finally, we apply the proposed method to seasonal particulate matter concentration data collected in Beijing, China, which yields meaningful local importance measures. The methods presented here are available in the drforest Python package. Supplementary materials for this article are available online.
Similar resources
Variable selection using random forests
This paper proposes, focusing on random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001, to investigate two classical issues of variable selection. The first one is to find important variables for interpretation and the second one is more restrictive and tries to design a good prediction model. The main contribution is...
Variable Selection Using Random Forests
One of the main topics in the development of predictive models is the identification of variables which are predictors of a given outcome. Automated model selection methods, such as backward or forward stepwise regression, are classical solutions to this problem, but are generally based on strong assumptions about the functional form of the model or the distribution of residuals. In this paper a...
Using Random Forests in the Structured Language Model
In this paper, we explore the use of Random Forests (RFs) in the structured language model (SLM), which uses rich syntactic information in predicting the next word based on words already seen. The goal in this work is to construct RFs by randomly growing Decision Trees (DTs) using syntactic information and investigate the performance of the SLM modeled by the RFs in automatic speech recognition...
VSURF: An R Package for Variable Selection Using Random Forests
This paper describes the R package VSURF. Based on random forests, and for both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables including some redundancy which can be relevant for interpretation, and the second one is a smaller subset corresponding to a model trying to avoid redundancy focusing more closely on the predict...
Variable Selection in Time Series Forecasting Using Random Forests
Time series forecasting using machine learning algorithms has gained popularity recently. Random forest is a machine learning algorithm implemented in time series forecasting; however, most of its forecasting properties have remained unexplored. Here we focus on assessing the performance of random forests in one-step forecasting using two large datasets of short time series with the aim to sugg...
Journal
Journal title: Journal of Computational and Graphical Statistics
Year: 2022
ISSN: 1061-8600, 1537-2715
DOI: https://doi.org/10.1080/10618600.2022.2069777